A simulation study: Using dual ancillary variable to estimate population mean under stratified random sampling

In this paper, we propose an improved ratio-in-regression type estimator for the finite population mean under stratified random sampling that uses both the ancillary variable and the rank of the ancillary variable. Expressions for the bias and mean square error of the estimators are derived up to the first order of approximation. The work focuses on the proper use of the ancillary variable and discusses how it can improve the precision of the estimates. Two real data sets and a simulation study are used to assess the performance of the estimators. We demonstrate theoretically and numerically that the proposed estimator performs well compared with all the existing estimators considered.


Introduction
In sampling theory, appropriate use of ancillary information may increase the precision of estimators. Numerous authors have employed ancillary information at the design stage and at the estimation stage. One purpose of sample survey theory is to estimate unknown population parameters of the variable of interest, such as the mean, median, proportion and variance. Stratified random sampling is preferable to simple random sampling when the population is heterogeneous.
In stratified random sampling, we divide the diverse population into non-overlapping strata or groups, and sampling is carried out separately within each stratum. Zaman and Kadilar [1] noted that stratification enhances the efficiency of the estimates when the variance across strata is considerably greater than the variance within strata. Zaman and Kadilar [2] and Zaman [3] provided efficient estimators of the population mean using the auxiliary variable in stratified random sampling. Zaman [4] proposed an efficient exponential type estimator for the population mean under stratified random sampling. Rather and Kadilar [5] introduced a dual to ratio-cum-product type exponential estimator under stratified random sampling. Mradula et al. [6] obtained an efficient estimator of the population mean under stratified random sampling with a linear cost function. Javed et al. [7] presented a simulation-based study of progressive estimation of the population mean through traditional and non-traditional measures in stratified random sampling. Javed and Irfan [8] presented a simulation-based study of new optimal estimators for the population mean using dual auxiliary information in stratified random sampling. Yadav and Tailor [9] estimated a finite population mean using two auxiliary variables under stratified random sampling. Zaman and Bulut [10] proposed a modified regression estimator using robust regression methods and covariance matrices in a stratified sampling scheme. Kumar and Vishwakarma [11] proposed generalized classes of regression-cum-ratio type estimators for the population mean under stratified random sampling. Zahid et al. [12] provided a generalized class of estimators for a sensitive variable in the presence of measurement error and non-response under stratified random sampling.
Some important references on estimating the population mean under stratified random sampling using auxiliary information include Aladag and Cingi [13], Grover and Kaur [14], Shabbir and Gupta [15], Khalid [16,17], Kadilar and Cingi [18,19], Koyuncu and Kadilar [20,21], and Singh and Vishwakarma [22]. Recently, Hussain et al. [23] proposed estimators of the finite population distribution function with dual use of the ancillary information under simple and stratified random sampling.
The idea of seeking more efficient estimators through dual use of the ancillary information under stratified random sampling has emerged recently. In this paper, we develop a new efficient estimator for the finite population mean using the dual ancillary variable under stratified random sampling.
Consider a finite population of distinct and identifiable units. A simple random sample of size $n_h$ is drawn without replacement from the $h$th stratum such that $\sum_{h=1}^{L} n_h = n$. Let $y$, $x$ and $r_x$ be the study variable, the auxiliary variable and the rank of the auxiliary variable, respectively, taking the values $y_{hi}$, $x_{hi}$ and $r_{xhi}$ for the $i$th unit in the $h$th stratum. Let the stratum means be [...], and let $\bar{y}_{st} = \sum_{h=1}^{L} \omega_h \bar{y}_h$, $\bar{x}_{st} = \sum_{h=1}^{L} \omega_h \bar{x}_h$ and $\bar{r}_{x,st} = \sum_{h=1}^{L} \omega_h \bar{r}_{xh}$ be the sample means of $y$, $x$ and $r_x$, respectively, across the strata, where [...] are the population means of $y$, $x$ and $r_x$, respectively.
We also define the following error terms. Let $\varepsilon_{0st} =$ [...]. Using these notations, we have [...], where [...] are the coefficients of variation of $y$, $x$ and $r_x$, and $\rho_{y_h x_h}$, $\rho_{y_h r_{xh}}$ and $\rho_{x_h r_{xh}}$ are the population correlation coefficients between $(y_h, x_h)$, $(y_h, r_{xh})$ and $(x_h, r_{xh})$, respectively, in the $h$th stratum. Let $s^2_{yh} =$ [...], and let [...] be the covariances between their respective subscripts.
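The notation above can be illustrated with a short Python sketch that computes the stratified sample means of $y$, $x$ and the rank of $x$. This is not the authors' code (they report using R). The stratum weight $N_h / N$, the ranking of $x$ within each stratum, the function names `ranks` and `stratified_means`, and the toy data are all assumptions made for this example.

```python
import random


def ranks(values):
    """Return 1-based ranks of values (ties broken by order of appearance)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def stratified_means(strata, sample_sizes, rng):
    """Stratified sample means of y, x and the rank of x.

    strata: list of strata, each a list of (y, x) pairs (hypothetical layout).
    sample_sizes: n_h for each stratum.
    rng: a random.Random instance, so that results are reproducible.
    """
    N = sum(len(units) for units in strata)
    y_st = x_st = r_st = 0.0
    for units, n_h in zip(strata, sample_sizes):
        w_h = len(units) / N                      # assumed weight N_h / N
        r_x = ranks([x for _, x in units])        # assumed: ranks within stratum
        idx = rng.sample(range(len(units)), n_h)  # SRS without replacement
        y_st += w_h * sum(units[i][0] for i in idx) / n_h
        x_st += w_h * sum(units[i][1] for i in idx) / n_h
        r_st += w_h * sum(r_x[i] for i in idx) / n_h
    return y_st, x_st, r_st


# Made-up population with two strata of sizes 4 and 6.
strata = [
    [(10.0, 1.0), (12.0, 2.0), (15.0, 4.0), (11.0, 3.0)],
    [(30.0, 9.0), (28.0, 8.0), (35.0, 12.0), (40.0, 15.0), (33.0, 10.0), (31.0, 11.0)],
]
y_st, x_st, r_st = stratified_means(strata, [2, 3], random.Random(1))
print(y_st, x_st, r_st)
```

When every stratum is sampled completely ($n_h = N_h$), the three values returned coincide with the corresponding population means, which is a convenient sanity check.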
The rest of the paper is arranged as follows. Section 2 reviews various estimators available in the literature under stratified random sampling. In Section 3, the proposed estimator of the finite population mean under stratified random sampling using dual ancillary variables is defined. In Section 4, a theoretical comparison is conducted to assess the performance of the estimators. Sections 5 and 6 give the numerical investigation and the data description. The Monte Carlo simulation is conducted in Section 7. The numerical results are discussed in Section 8. Finally, concluding remarks are given in Section 9.

Literature review
This section reviews various stratified estimators that are available in the literature.

(i) The traditional mean estimator in stratified random sampling is: [...] The variance of $\bar{Y}_{st}$ is given by: [...]

(ii) Cochran [24] suggested the traditional ratio estimator $\bar{Y}_{R(st)}$: [...] The bias and MSE of $\bar{Y}_{R(st)}$ are given by: [...]

(iii) Murthy [25] suggested the usual product estimator $\bar{Y}_{P(st)}$, which is given by: [...] The bias and MSE of $\bar{Y}_{P(st)}$ are given by: [...]

(iv) Bahl and Tuteja [26] suggested the following estimators: [...] The biases and MSEs of $\bar{Y}_{BT,R(st)}$ and $\bar{Y}_{BT,P(st)}$ are given by: [...]

(v) The difference estimator $\bar{Y}_{dif(st)}$ is given by: [...] where $d$ is an appropriate constant. The minimum variance of $\bar{Y}_{dif(st)}$ at the optimal value of $d$ is given as: [...]

(vi) Rao [27] proposed the following estimator: [...] where $Q_1$ and $Q_2$ are constants. The bias and MSE of $\bar{Y}_{R,D(st)}$ are given by: [...] The optimum values of $Q_1$ and $Q_2$ are given by: [...] The minimum MSE of $\bar{Y}_{R,D(st)}$ is given by: [...]

(vii) Singh et al. [28] suggested the following estimator: [...] For $a = 1$ and $b = 0$, the bias and MSE of $\bar{Y}_{Singh}$ are given by: [...]

(viii) Grover and Kaur [29] suggested the following estimator: [...] where $Z_1$ and $Z_2$ are unknown constants. For $a = 1$ and $b = 0$, the bias and MSE of $\bar{Y}_{Gk(st)}$ are given by: [...] The optimum values of $Z_1$ and $Z_2$ are given as: [...] The minimal MSE of $\bar{Y}_{Gk(st)}$ is: [...]

(ix) Ahmad et al. [30] proposed an improved estimator $\hat{Y}_{Pr,st}$, given by: [...] where $Q_i$ ($i = 5, 6, 7$) are constants. The bias and MSE of $\hat{Y}_{Pr,st}$ are given by: [...] The optimum values of $Q_5$, $Q_6$ and $Q_7$ are given by: [...] The minimum MSE of $\hat{Y}_{Pr,st}$ at the optimum values of $Q_5$, $Q_6$ and $Q_7$ is given by: [...] where $R^2_{y.xz} =$ [...]
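For orientation, the simpler estimators reviewed in this section can be sketched in Python using their standard textbook (combined) forms. These forms are an assumption: they may differ in notation or detail from the expressions used in the paper, and the function names are hypothetical. Here `y_st` and `x_st` are the stratified sample means of the study and auxiliary variables, and `X_bar` is the known population mean of the auxiliary variable.

```python
import math


def ratio_estimate(y_st, x_st, X_bar):
    """Classical combined ratio estimator: y_st * X_bar / x_st."""
    return y_st * X_bar / x_st


def product_estimate(y_st, x_st, X_bar):
    """Classical combined product estimator: y_st * x_st / X_bar."""
    return y_st * x_st / X_bar


def exp_ratio_estimate(y_st, x_st, X_bar):
    """Exponential ratio-type estimator in the Bahl and Tuteja style."""
    return y_st * math.exp((X_bar - x_st) / (X_bar + x_st))


def exp_product_estimate(y_st, x_st, X_bar):
    """Exponential product-type estimator in the Bahl and Tuteja style."""
    return y_st * math.exp((x_st - X_bar) / (x_st + X_bar))


def difference_estimate(y_st, x_st, X_bar, d):
    """Difference estimator: y_st + d * (X_bar - x_st), d a chosen constant."""
    return y_st + d * (X_bar - x_st)


# Made-up inputs: the sample under-represents x (4 < 5), so the ratio-type
# estimators adjust y_st upward and the product-type estimators downward.
print(ratio_estimate(20.0, 4.0, 5.0))
print(exp_ratio_estimate(20.0, 4.0, 5.0))
print(difference_estimate(20.0, 4.0, 5.0, d=2.0))
```

All five functions return `y_st` unchanged when `x_st` equals `X_bar`, which reflects the fact that the auxiliary information only corrects the estimate when the sample mean of $x$ departs from its known population mean.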

Proposed estimator
Suitable use of ancillary information may improve the precision of estimators at both the design and estimation stages. The rank of the ancillary variable is correlated with the study variable when the correlation between the study and ancillary variables is strong. Dual use of the ancillary variable has rarely been attempted in the literature, which motivates the present work. The principal advantage of the proposed ratio-in-regression type estimator under stratified random sampling is that it is more flexible and efficient than the existing estimators. Taking motivation from Ahmad et al. [30], we propose a ratio-in-regression type exponential estimator of the population mean under stratified random sampling.
where $Q_{11}$, $Q_{12}$ and $Q_{13}$ are unknown constants.
Solving $\bar{Y}_{ss(st)}$ given in Eq (22): [...] The optimum values of $Q_{11}$, $Q_{12}$ and $Q_{13}$ are given by: [...] Putting the optimum values of $Q_{11}$, $Q_{12}$ and $Q_{13}$ in (23), we get the minimal mean square error of $\bar{Y}_{ss(st)}$, given by: [...]

Data description
To show the efficiency of the proposed estimator over the existing estimators, we conduct a numerical study of their performance. For this purpose, we use two real data sets from Kadilar and Cingi (2003). Summary statistics for population-I are given in Tables 1 and 2, and the MSE and PRE are presented in Table 3; similarly, summary statistics for population-II are given in Tables 4 and 5, and the MSE and PRE are presented in Table 6. For population-I, the study variable is apple production in 1999 and the auxiliary variable is the number of apple trees in 1999. For population-II, the study variable is apple production in 1999 and the auxiliary variable is the number of apple trees in 1998 (Source: Institute of Statistics, Republic of Turkey). We stratified the data by the regions of Turkey (1: Marmara, 2: Aegean, 3: Mediterranean, 4: Central Anatolia, 5: Black Sea, 6: East and Southeast Anatolia), and from each stratum we randomly selected samples whose sizes were computed using the Neyman allocation method.
Population I (Source: Kadilar and Cingi [18]): Y is the apple production in 1999, and X is the number of apple trees in 1999.
Population II (Source: Kadilar and Cingi [18]): Y is the apple production in 1999, and X is the number of apple trees in 1998.
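The Neyman allocation mentioned above sets each stratum sample size proportional to $N_h S_h$, that is, $n_h = n \, N_h S_h / \sum_{h} N_h S_h$, where $S_h$ is the stratum standard deviation of the study variable. The Python sketch below illustrates the rule. The function name, the largest-remainder rounding step and the numbers are assumptions made for this example and are not taken from Tables 1 to 6.

```python
def neyman_allocation(N_h, S_h, n):
    """Allocate a total sample of size n across strata by Neyman allocation.

    N_h: stratum sizes; S_h: stratum standard deviations of the study variable.
    Rounding uses the largest-remainder rule so the sizes sum exactly to n.
    This sketch does not cap n_h at N_h or force n_h >= 1.
    """
    weights = [N * S for N, S in zip(N_h, S_h)]
    total = sum(weights)
    raw = [n * w / total for w in weights]   # unrounded allocation
    alloc = [int(r) for r in raw]            # floor, since raw values are >= 0
    leftover = n - sum(alloc)
    by_fraction = sorted(range(len(raw)),
                         key=lambda h: raw[h] - alloc[h],
                         reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical three-stratum example.
print(neyman_allocation([100, 200, 300], [10.0, 20.0, 5.0], 60))
```

In this made-up example the second stratum receives the largest share because it combines a large size with the largest standard deviation, which is the behaviour Neyman allocation is designed to produce.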

Simulation study
The efficiency of the proposed estimator over the competing estimators was demonstrated in the preceding section. A Monte Carlo simulation in R is also used to assess the efficiency of the proposed estimator using the dual ancillary variable under stratified random sampling. The proposed estimator is compared with the existing estimators using the percentage relative efficiency (PRE). Once again, the real population of Kadilar and Cingi [18] is used. The following steps are carried out in R to conduct the simulation study.
2. With stratified sampling, the population is divided into six strata and the procedure is repeated 100,000 times to compute the values of the proposed and existing estimators. The results are given in Table 7.
As a consequence of the above results, the proposed estimator performs best among all the estimators under consideration.
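The general shape of such a simulation can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. It uses a synthetic six-stratum population rather than the Kadilar and Cingi data, equal stratum sample sizes, 2,000 replications instead of 100,000, and compares only the stratified mean with the classical combined ratio estimator rather than the proposed estimator. The PRE is taken here as 100 times the ratio of the MSE of the stratified mean to the MSE of the estimator in question, which is an assumed convention.

```python
import random


def make_population(rng, sizes):
    """Synthetic strata of (y, x) pairs with y roughly proportional to x."""
    strata = []
    for h, N_h in enumerate(sizes, start=1):
        xs = [rng.uniform(10 * h, 20 * h) for _ in range(N_h)]
        strata.append([(2.0 * x + rng.gauss(0.0, 1.0), x) for x in xs])
    return strata


def simulate_pre(strata, sample_sizes, reps, rng):
    """Monte Carlo MSE and PRE of the stratified mean and ratio estimators."""
    N = sum(len(units) for units in strata)
    Y_bar = sum(y for units in strata for y, _ in units) / N
    X_bar = sum(x for units in strata for _, x in units) / N
    sq_err = {"mean": 0.0, "ratio": 0.0}
    for _ in range(reps):
        y_st = x_st = 0.0
        for units, n_h in zip(strata, sample_sizes):
            sample = rng.sample(units, n_h)   # SRS without replacement
            w_h = len(units) / N
            y_st += w_h * sum(y for y, _ in sample) / n_h
            x_st += w_h * sum(x for _, x in sample) / n_h
        estimates = {"mean": y_st, "ratio": y_st * X_bar / x_st}
        for name, est in estimates.items():
            sq_err[name] += (est - Y_bar) ** 2
    mse = {name: total / reps for name, total in sq_err.items()}
    # PRE relative to the stratified mean (assumed reference estimator).
    pre = {name: 100.0 * mse["mean"] / m for name, m in mse.items()}
    return mse, pre


rng = random.Random(2024)
strata = make_population(rng, [40, 50, 60, 70, 80, 90])
mse, pre = simulate_pre(strata, [5] * 6, reps=2000, rng=rng)
print(mse)
print(pre)
```

Because the synthetic $y$ is almost proportional to $x$, the ratio estimator has a PRE far above 100 in this setting; with real data the gain depends on the strength of the relationship between the study and auxiliary variables.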

Advantages of Monte Carlo simulation
A Monte Carlo simulation rests on the fact that the probability of varying outcomes cannot be determined analytically because of the interference of random variables. It therefore relies on repeated random sampling to obtain results. A Monte Carlo simulation takes a variable that has uncertainty and assigns it a random value. The model is then run, and a result is recorded. The process is repeated many times, assigning different values to the variable in question. Once the simulation is complete, the results are averaged to provide an estimate.

Discussion
To evaluate the advantage of the proposed estimator under stratified random sampling, we use two real data sets for numerical comparison. The numerical results presented in Tables 3 and 6 show that the proposed ratio-in-regression type estimator is more efficient than the usual sample mean estimator and the estimators of Cochran [24], Murthy [25], Bahl and Tuteja [26], the difference estimator, Rao [27], Singh et al. [28], Grover and Kaur [29], and Ahmad et al. [30]. Table 7 gives simulation results for the percentage relative efficiency of the proposed estimator with respect to the existing estimators for different sample sizes, namely 64, 80, 96, 128, 144, 160, 176, 192, 208, and 240. The percentage relative efficiency varies with the sample size. The simulation results also show that the proposed estimator is more efficient than its existing counterparts in terms of percentage relative efficiency. As the sample size increases, the efficiency of the proposed estimator also increases. Overall, the proposed estimator shows the largest gain in efficiency among all the estimators considered.
For visualization, the comparison of the proposed estimator with the existing estimators in terms of percentage relative efficiency is presented in Fig 1. The length of a line in the graph is directly associated with the efficiency of an estimator: the greater the length of the line, the more efficient the estimator. In general, we recommend using the proposed estimator in new surveys instead of the existing estimators examined in this paper for estimating the finite population mean under stratified random sampling.

Conclusion
In this article, we propose a ratio-in-regression type exponential estimator for the finite population mean under stratified random sampling, which requires information on the sample mean of the ancillary variable and on the rank of the ancillary variable. Expressions for the mean square error of the proposed estimator are derived up to the first order of approximation, and a comparison is made with the estimators mentioned herein. The results from the real data sets show that the proposed estimator performs well compared with its existing counterparts. A simulation analysis is also carried out to assess the robustness and generalizability of the proposed estimator, and its findings confirm the utility of the proposed estimator. A numerical study is carried out to support the theoretical results. We therefore recommend the proposed estimator for efficiently estimating the finite population mean under stratified random sampling. The current work can be extended to develop an improved class of estimators under two-phase, non-response, two-stage, and cumulative distribution function sampling schemes, using information on the ancillary variable to estimate the population mean under simple and stratified random sampling.